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Abstract 

This  project  addressed  the  statistical  inverse  problem  of  reconstruction  of  an  uncertain 
shape  of  a  scatterer  or  properties  of  a  medium  from  noisy  observations  of  scattered 
wavefields.  The  Bayesian  solution  of  this  inverse  problem  yields  a  posterior  pdf^ 
requiring  the  solution  of  the  forward  wave  equation  to  evaluate  the  probability  of 
any  point  in  parameter  space.  The  standard  approach  is  to  sample  this  pdf  via 
an  MCMC  method  and  then  compute  statistics  of  the  samples.  However,  standard 
MCMC  methods  view  the  underlying  parameter-to-observable  map  as  a  black  box, 
and  thus  do  not  exploit  its  structure,  hence  becoming  prohibitive  for  high  dimensional 
parameter  spaces  and  expensive  simulations. 

A  Langevin-accelerated  MCMC  method  for  sampling  high-dimensional  PDE-based 
probability  densities  was  developed.  The  method  builds  on  previous  work  in  Langeviii 
dynamics,  which  uses  gradient  information  to  guide  the  sampling  in  useful  directions, 
improving  convergence  rates.  The  Langevin  idea  was  extended  to  exploit  local  Hes¬ 
sian  (of  the  negative  log  posterior)  information,  leading  to  a  stochastic  version  of 
Newton’s  method.  Fast  Hessian  approximations  were  developed  for  several  inverse 
scattering  problems.  Applications  to  model  inverse  medium  scattering  problems  in¬ 
dicated  several  orders  of  magnitude  improvement  over  a  reference  black-box  MCMC 
method. 

1.  Approach 

The  overall  goal  of  this  project  has  been  to  create  systematic,  rigorous,  and  scal¬ 
able  algorithms  for  quantifying  uncertainties  in  inverse  wave  scattering  problems. 
These  uncertainties  reflect  our  incomplete  knowledge  of  the  medium  in  which  the 
waves  propagate  (inverse  medium  scattering  problem)  or  the  shape  of  a  scatterer  (in¬ 
verse  shape  scattering  problem).  The  problem  of  inferring  an  uncertain  medium  or 
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shape  from  observations  of  scattered  wavefields  is  fundamentally  a  statistical  inverse 
problem.  Our  lack  of  knowledge  results  from  noisy  measurements,  sparse  observers, 
uncertain  forward  models,  and  uncertain  prior  model  parameter  information.  Uncer¬ 
tainty  in  the  reconstructed  model  parameters  is  a  fundamental  feature  of  ill-posed 
inverse  problems. 

The  deterministic  approach  to  the  inverse  scattering  problem,  which  amounts  to  min¬ 
imizing  a  regularized  data  misfit  function,  is  incapable  of  accounting  for  uncertainties 
in  the  solution  of  the  inverse  problem.  Bayesian  inference  provides  a  systematic 
framework  for  incorporating  uncertainties  in  observations,  forward  models,  and  prior 
knowledge  to  quantify  uncertainties  in  the  model  parameters  [29,32].  Suppose  the 
relationship  between  output  observables  y  (such  as  waveforms  at  sensor  locations) 
and  uncertain  model  parameters  p  (such  as  those  describing  a  wave  speed  of  a  hetero¬ 
geneous  medium  or  shape  of  a  scatterer)  is  denoted  by  y  =  /(p,  e),  where  e  represents 
noise  due  to  measurement  and/or  modeling  errors.  In  other  words,  given  the  model 
parameters  p  and  noise  e,  the  function  /(p,  e)  solves  the  forward  (acoustic,  elastic, 
or  electromagnetic  wave  propagation)  problem  to  yield  y,  the  predicted  outputs  at 
the  measurement  locations  (and  time  instants).  Suppose  also  that  we  have  the  prior 
probability  density  7rprior(p)5  which  encodes  the  confidence  we  have  in  prior  infor¬ 
mation  on  the  unknown  model  parameters  (i.e.  independent  of  present  observations), 
and  the  likelihood  function  7r(yobs|p)5  which  describes  the  conditional  probability  that 
the  model  parameters  p  gave  rise  to  the  actual  measurements  yobs-  Then  Bayes’  the¬ 
orem  of  inverse  problems  expresses  the  posterior  probability  density  of  the  model 
parameters,  VTpostj  given  the  data  yobs^  as  the  conditional  probability 

^post  (p)  =  7r(p|pobs)  OC  7rprior(p) '7r(yobs|p)-  (1) 


Expression  (1)  provides  the  statistical  solution  of  the  inverse  problem  as  a  probability 
density  for  the  model  parameters  p. 

As  a  specific  example,  suppose  the  noise  is  additive  and  is  modeled  as  Gaussian 
with  zero  mean  and  a  covariance  matrix  Cnoise?  and  suppose  the  prior  density  of  the 
model  parameters  is  represented  as  Gaussian  with  Pprior  as  the  mean  and  Cpnor  as  the 
covariance  matrix,  then  the  posterior  probability  density  of  the  model  parameters  is 
given  explicitly  (within  a  normalizing  constant)  by 


7rpost(p)  OC  exp 


-^(/(p) -yobs)  c, 


Tn-l 

noise 


(/(p)  Pobs)  2^^  Pprior)  C’priorCP 


Pprior) 


(2) 

This  latter  expression  shows  that  even  when  the  prior,  measurement,  and  modeling 
uncertainties  are  Gaussian,  the  posterior  density  of  the  model  parameters  is  gener¬ 
ally  not  Gaussian,  due  to  the  nonlinearity  of  the  parameter-to-observable  map,  /(p). 
However,  this  expression  exposes  a  significant  connection  between  statistical  and  de¬ 
terministic  inversion.  Suppose  we  wish  to  find  the  value  of  the  most  likely  model 
parameters,  by  maximizing  the  posterior  density  (2).  This  is  equivalent  to  minimiz¬ 
ing  the  negative  argument  of  the  exponential  function — ^which  is  precisely  the  misfit 
function  that  is  minimized  by  deterministic  inverse  methods,  provided  we  interpret 
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the  prior  as  a  regularization  and  weigh  the  data  misfit  by  the  inverse  noise  covari¬ 
ance.  Moreover,  it  is  straightforward  to  show  that  the  inverse  of  the  Hessian  matrix 
of  the  deterministic  misfit  function  approximates  the  covariance  matrix  of  the  poste¬ 
rior  density  (the  equivalence  is  exact  when  f{p)  is  linear).  This  connection  between 
the  Hessian  operator  of  the  deterministic  inverse  problem  and  the  inverse  covariance 
matrix  of  the  statistical  inverse  problem  is  crucial,  and  we  believe  is  a  key  (and  un¬ 
exploited)  idea  in  overcoming  the  curse  of  dimensionality  associated  with  uncertainty 
quantification  in  inverse  problems. 

While  it  is  easy  to  write  down  expression  such  as  (1)  or  (2)  for  the  posterior  probability 
density,  making  use  of  these  expressions  poses  a  challenge,  because  the  posterior 
probability  density  is  a  surface  in  high  dimensions  (equal  to  the  number  of  model 
parameters  p),  and  because  the  solution  of  the  forward  problem  (i.e.,  computing  f{p) 
given  p)  is  required  to  evaluate  the  probability  of  any  point  in  parameter  space. 
Straightforward  grid-based  sampling  is  out  of  the  question  for  anything  other  than 
a  few  parameters  and  cheap  forward  simulations.  Special  sampling  techniques,  such 
as  Markov  chain  Monte  Carlo  (MCMC)  methods,  have  been  developed  to  generate 
sample  ensembles  that  typically  require  many  fewer  points  than  grid-based  sampling 
[21,29,32,33]. 

In  particular,  Metropolis-Hastings  (M-H)  methods  employ  a  given  probability  density 
q{pk,y)  a,t  each  sample  point  in  parameter  space  pf.  to  generate  a  proposed  sample 
point  y.  Once  generated,  the  M-H  criterion  chooses  to  either  accept  or  reject  the 
proposed  sample  point  depending  on  its  probability  relative  to  the  probability  of  the 
current  point  ,  and  repeats  from  the  new  point,  thereby  generating  a  chain  of  samples 
from  the  posterior  density  7rpost(p)-  For  example,  a  popular  choice  for  the  proposal 
density  is  the  isotropic  Gaussian  q{pk,  y)  =  exp[— i(pfc  — y)^];  the  resulting  method 
is  known  as  Random  Walk  Metropolis.  Though  easy  to  sample  from,  this  choice  of 
proposal  function  is  not  tailored  to  the  structure  of  the  underlying  posterior  proba¬ 
bility,  and  is  therefore  typically  slow  to  converge.  MCMC  methods  such  as  Random 
Walk  Metropolis  become  prohibitive  as  the  complexity  of  the  forw'ard  simulation  and 
the  dimension  of  the  parameter  space  increase.  When  the  model  parameters  repre¬ 
sent  a  (suitably-discretized)  field  (such  as  scatterer  shape  or  medium  wave  speed),  and 
when  the  forward  PDE  requires  hours  to  solve  on  a  parallel  computer  (such  as  mid- 
to-high  frequency  3D  wave  propagation),  the  MCMC  framework  becomes  completely 
intractable. 

The  central  difficulty  in  scaling  up  conventional  MCMC  for  large-scale  forward  sim¬ 
ulations  and  high-dimensional  parameter  spaces  is  that  this  is  a  purely  black-box  ap¬ 
proach,  i.e.  it  does  not  exploit  the  structure  of  the  parameter-to-observable  map  f{p). 
Twenty  years  of  advances  in  algorithms  for  deterministic  large-scale  PDEl-constrained 
optimization  have  taught  us  that  making  maximal  use  of  derivative  information  can 
greatly  speed  up  the  search  process  for  extremum  points  (e.g.  [7,11,12,13,27,28]). 
In  this  project  we  have  aimed  to  overcome  the  intractability  of  conventional  meth¬ 
ods  for  statistical  inverse  scattering  problems  by  developing  scalable  algorithms  that 
exploit  the  structure  of  inverse  wave  propagation  parameter-to-observable  operators 
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/(p).  We  have  focused  on  developing  preconditioned  Langevin  methods  that  exploit 
this  structure  to  greatly  speed  up  sampling  (Section  2).  A  crucial  aspect  of  the  suc¬ 
cess  of  preconditioned  Langevin  methods  are  fast  algorithms  for  approximating  the 
inverse  Hessian  for  inverse  wave  scattering  problems  (Section  3). 

2.  Fast  Hessian-preconditioned  Langevin  samplers 

We  have  developed  fast  sampling  methods  that  build  on — and  significantly  extend — 
ideas  from  Langevin  dynamics,  which  use  gradient  information  to  accelerate  sampling 
of  a  target  density,  e.g.  [10,31].  The  Langevin  equation  is  a  stochastic  differential 
equation, 

dPt  =  A  V  log  TTpostdf  +  V2A^/'^dWt,  (3) 

with  7rp„st(p)  as  an  invariant  density.  Here,  Wt  is  the  i.i.d.  vector  of  standard  Brow¬ 
nian  motions.  Preconditioning  by  a  symmetric  positive  definite  operator  A  preserves 
the  invariance  of  the  density.  In  practice,  we  discretize  in  time  with  timestep  At, 
yielding  (e.g.  for  explicit  Euler)  the  update 

Pk+I  =  Pk  +  log  TTpost  At  -I-  y/2AiA^^'^M{0, 1)  (4) 

where  A^(0,/)  is  the  i.i.d.  standard  normal  density.  Discretization  in  time  can  add 
bias,  so  we  use  the  Langevin  steps  as  proposals  for  a  Metropolis-Hastings  MCMC 
method.  The  form  (4)  shows  immediately  the  connection  with  deterministic  opti¬ 
mization  methods:  the  gradient  term  VlogTTpost  is  a  steepest  ascent  direction  for  the 
posterior  density.  In  its  absence  (and  in  the  absence  of  preconditioning,  i.e.  A  =  I) 
we  recover  a  Gaussian  random  walk  from  the  last  term  in  (3).  The  addition  of  the 
gradient  term  drives  the  sampling  in  (the  locally  steepest)  direction  of  higher  prob¬ 
ability.  While  bringing  in  gradient  information  to  accelerate  sampling  is  attractive, 
we  know  that  steepest  descent  is  a  poor  choice  for  search  directions  in  large-scale 
optimization  (particularly  for  anisotropic  pdfs),  and  we  seek  to  improve  on  it. 

Taking  the  preconditioner  A  as  the  inverse  of  the  Hessian  of  logTTpost?  we  obtain  the 
stochastic  equivalent  of  Newton’s  method.  In  the  common  case  of  Gaussian  additive 
noise  and  prior,  the  (negative)  log  of  the  posterior  density  is  simply  the  “regular¬ 
ized”  misfit  function  (the  sum  of  the  data  misfit  and  prior/regularization  term)  that 
deterministic  inverse  methods  seek  to  minimize.  Thus,  similar  to  Newton’s  locally- 
quadratic  approximation  of  the  objective,  the  Hessian-preconditioned  Langevin  step 
makes  a  locally-Gaussian  approximation  of  TTpost-  This  endows  the  sampling  process 
with  information  on  the  curvature  of  the  posterior  density  surface,  which  is  crucial  in 
high  dimensions.  This  results  in  a  need  for  substantially  fewer  sampling  points  rela¬ 
tive  to  a  black-box  MCMC  method,  just  as  deterministic  Newton  requires  substan¬ 
tially  fewer  iterations  to  find  the  optimum  compared  to  a  derivative-free  optimization 
method. 

Moreover,  it  can  be  shown  [20]  that  in  the  limiting  case  when  the  posterior  density 
TTpost  is  in  fact  Gaussian  (e.g.  when  the  inverse  problem  is  linear  and  the  noise  is 
additive  and  Gaussian),  this  so-called  stochastic  Newton  method  not  only  samples  the 
target  density  at  long  times,  but  accurately  samples  from  TTpost  at  every  time  step.  This 
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MPSRF  Plots  tor  65D  Chains 


Figure  1:  Comparison  of  number  of  points  taken  for  sampling  posterior  density  for  a  ID  inverse 
scattering  problem  to  identify  the  distribution  of  the  elastic  modulus  of  a  heterogeneous  medium 
for  a  65-dimensional  discretization  of  the  medium.  DRAM  (black),  unpreconditioned  Langevin 
(blue),  and  Stochastic  Newton  (red)  sampling  methods  are  compared.  Convergence  indicator  is 
multivariate  potential  scale  reduction  factor  (MPSRF  [18]),  for  which  a  value  of  unity  indicates 
convergence.  Stochastic  Newton  requires  over  three  orders  of  magnitude  fewer  sampling  points. 


means  that  the  Metropolis-Hastings  criterion  will  accept  all  of  the  proposed  sample 
points,  and  that  a  minimum  number  of  points  are  needed  to  accurately  sample  from 
the  given  distribution.  For  densities  that  are  not  Gaussian,  stochastic  Newton  will 
still  provide  a  substantial  speedup  over  a  conventional  random  walk,  since  a  Gaussian 
approximation  (based  on  a  local  quadratic  approximation  of  logTTpost)  or  equivalently 
a  linearized  approximation  of  the  inverse  problem)  will  generally  yield  more  useful 
information  on  the  behavior  of  TTpost  than  a  standard  normal  density  approximation 
(or  other  heuristic)  will. 

We  have  developed  an  implementation  of  the  stochastic  Newton  method,  and  applied 
it  to  solve  nonlinear  inverse  medium  and  shape  scattering  problems  in  one  and  two 
dimensions  [20].  For  example,  for  a  ID  inverse  medium  problem  in  which  the  un¬ 
certain  elastic  modulus  of  the  medium  is  discretized  into  64  piecewise  linear  finite 
elements.  Figure  1  indicates  just  0(10^)  samples  are  necessary  to  adequately  sam¬ 
ple  the  (non-Gaussian)  posterior  density,  while  a  reference  publicly-available  (non¬ 
derivative)  MCMC  method  (Delayed  Rejection  Adaptive  Metropolis  (DRAM))  [25]) 
is  nowhere  near  converged  after  even  O(IQ^)  samples.  The  performance  of  unpre¬ 
conditioned  Langevin  MCMC  is  similar  to  that  of  DRAM,  indicating  the  crucial 
role  of  the  Newton  direction  vs.  steepest  descent.  Moreover,  because  the  (inverse) 
Hessian  captures  the  (local)  covariance  structure  of  the  posterior  density,  this  orders- 
of-magnitude  speedup  is  expected  to  become  even  larger  as  the  parameter  dimension 
increases.  Furthermore,  the  stochastic  Newton  method  is  able  to  sample  a  poste- 
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Comparison  of  650  and  10250  MPSRF 


Number  of  Samples 


Figure  2:  Convergence  of  stochastic  Newton  for  1025-dimension  problem  compared  with  65- 
dimension  problem  (1025-dimension  results  based  on  fast  low-rank  implementation),  showing  similar 
rates  of  convergence.  Figure  implies  mesh-independence  (i.e.  dimension-independence)  of  stochastic 
Newton  method  for  this  problem. 


rior  pdf  stemming  from  a  1025-dimensional  problem  (in  which  the  wave  propagation 
medium  is  discretized  into  1024  finite  elements).  Figure  2  compares  the  convergence 
of  the  stochastic  Newton  method  for  the  65-dimensional  and  1025-dimensional  in¬ 
verse  medium  problems.  As  can  be  seen  in  the  figure,  the  convergence  behavior  for 
the  two  problems  is  similar;  in  other  words,  the  convergence  is  independent  of  problem 
size.  This  behavior  is  similar  to  the  well-known  mesh-independeiice  property  of  the 
deterministic  Newton  method. 

The  stochastic  Newton  method  has  also  been  used  to  solve  the  electromagnetic  sta¬ 
tistical  inverse  problem  of  inferring  uncertainty  in  the  shape  of  a  scatterer  from  the 
scattered  wavefield.  Based  on  the  forward  code  from  [26],  we  have  built  a  2D  discon¬ 
tinuous  Galerkin  spectral  element  time-domain  Maxwell  solver,  truncated  with  PMLs 
and  enhanced  with  adjoint-based  shape  gradient/Hessian  capability,  and  numerically 
integrated  in  time  with  a  fourth-order,  five-stage  explicit  Runge  Kutta  scheme.  Fig¬ 
ure  3  compares  the  uncertain  shape  reconstructed  from  scattered  noisy  waveforms, 
with  the  ^‘exact”  shape.  As  can  be  seen  from  the  figure,  the  statistical  solution  to  the 
inverse  problem  (implied  by  grey  shading)  goes  well  beyond  the  deterministic  solution 
(see  blue^  shape),  by  conveying  information  about  the  confidence  we  have  in  the  in¬ 
ferred  shape — taking  into  account  any  prior  knowledge  on  the  shape  and  noise  in  the 
data  and  model.  In  this  case,  the  front  of  the  scatterer  is  identified  with  notably  less 

are  speaking  only  loosely  when  we  equate  the  mean  shape  with  the  deterministic  solution. 
If  the  noise  is  additive  and  Gaussian  and  the  parameter-to-observable  map  is  linear,  then  the  two 
are  equivalent;  otherwise,  they  will  differ,  and  the  differences  will  grow  as  the  map  becomes  more 
nonlinear. 
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Figure  3:  Uncertain  shape  reconstructed  from  noisy  scattered  EM  observations.  Discontinuous 
Galerkin  discretization  with  3rd  order  spectral  elements  in  space  and  4th“Order,  5-stage  explicit 
Runge  Kutta  integration  in  time.  Prior  favors  shape  with  small  boundary.  Shape  is  parametrized  by 
6  cosine  modes,  r{6)  =  Computational  domain  =  {{x^y)  —I  <  x,y  <  1}.  PML 

domain  Qpml  =  ^  1  ^  |y|  ^  2}.  Kite  shape  to  generate  synthetic  observations  given  by 

X  =  0.2  [cos(^)  +  0.65(cos(2^)  -  1)] ,  y  =  0.3sin(^).  Incident  wave  =  cos(8(i-x)),  Hx  =  0,  Hy  =  0 
from  left.  31  observation  points:  x  =  — 0.9, y  =  linspace{— 0.9, 0.9, 31}.  Ez^Hx^Hy  are  observed 
in  0  <  f  <  TT  at  all  time  steps  with  5%  Gaussian  noise.  At  =  10“^,  resulting  in  3324  time  steps. 
Mesh  size  h^nin  =  0.05,  resulting  in  ~  135,000  DOF.  Left:  Snapshot  of  electric  field.  Plane  wave 
incident  from  left,  receivers  located  at  black  dots,  scatterer  shown  in  white.  Right:  Comparison 
of  “exact”  shape  (red)  with  mean  of  reconstructed  shape  (blue)  and  maximum  a  posteriori  shape 
(black),  superposed  on  gray  region  indicating  5%  and  95%  quantiles  of  shape  uncertainty.  Note  that 
uncertainty  in  reconstruction  is  greatest  behind  the  scatterer. 


uncertainty  than  the  back,  and  in  general  the  recovered  shape  has  large  uncertainty, 
stemming  primarily  from  the  limited  observations.  Despite  the  statistical  inverse 
problem’s  having  “only”  six  parameter,  the  size  of  the  forward  problem  (135,000  spa¬ 
tial  unknowns  and  over  3000  time  steps)  make  forward  solution  so  expensive  that  the 
problem  cannot  be  solved  at  all  by  conventional  MCMC. 

The  critical  role  that  inverse  Hessian  preconditioning  plays  is  that  it  brings  to  the 
statistical  sampling  process  the  power  of  Newton-type  deterministic  optimization 
methods — which  for  a  large  class  of  PDE-based  inverse  problems  can  deliver  solu¬ 
tions  at  the  cost  of  a  constant  number  of  forward  solves,  independent  of  problem 
size,  e.g.  [4,5,6,7,9,16,17,19,23].  However,  stochastic  Newton  carries  the  added 
cost  of  having  to  compute  the  deterministic  Newton  step  (i.e.  solve  linear  systems 
having  the  Hessian  as  coefficient  matrix)  as  well  as  find  a  square  root  of  the  inverse 
of  the  Hessian  at  each  sample  point,  as  seen  in  Equation  (4).  Thus  the  cost  of  a 
stochastic  Newton  sample  point  can  be  significantly  more  expensive  than  that  of  a 
non-derivative  MCMC  method.  Explicitly  constructing  the  (formally  dense)  Hessian 
operator  and  factoring  it  is  completely  out  of  the  question  for  large-scale  problems, 
since  this  would  require  a  number  of  forward  solves  equal  to  the  number  of  uncertain 
model  parameters.  However,  the  key  to  the  success  of  Newton  methods  for  determin¬ 
istic  inverse  problems  is  to  recognize  that  for  ill-posed  inverse  problems,  the  Hessian 
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(or  its  Gauss-Newton  approximation)  is  usually  the  sum  of  a  compact  operator  (the 
data  misfit  part,  since  the  data  typically  provide  limited  information  on  the  model 
parameter  fields)  and  a  differential  part  (since  the  regularization/prior  is  typically 
smoothing,  and  its  inverse  is  seen  by  the  Hessian).  Thus,  when  preconditioned  by  the 
prior,  the  Hessian  often  behaves  like  a  compact  perturbation  of  the  identity,  and  fast 
approximations  (e.g.  using  low  rank  [8]  or  multilevel  approximations  [1,5],  or  spec¬ 
tral  representations  [14])  can  be  combined  with  Krylov  methods  to  find  the  action  of 
the  inverse  Hessian  on  a  vector  in  a  mesh-independent  (and  often  small)  number  of 
forward  solves.  Moreover,  these  ideas  can  be  extended  to  rapidly  find  a  square  root 
of  the  inverse  Hessian  [20],  which  is  needed  to  draw  samples  from  the  local  Gaussian 
approximation  of  the  posterior. 

Incorporating  Hessian  information  in  Langevin  dynamics-based  sampling  as  above 
permits  explicit  separation  of  (1)  the  data  misfit  contribution  to  the  posterior,  which 
typically  provides  “sparse”  information,  i.e.  information  on  a  limited  number  of  direc¬ 
tions  in  parameter  space  (a  reflection  of  the  ill-posed  nature  of  the  inverse  problem), 
meaning  that  the  expensive  forward  simulations — needed  to  relate  model  parameters 
to  observables — can  be  limited  to  just  these  directions,  and  (2)  the  prior  contribution 
to  the  posterior,  which  often  provides  “dense”  information  in  parameter  space,  but 
this  information  is  independent  of  the  forward  model,  and  thus  is  cheap  to  evaluate. 
The  typically  small  and  bounded  dimension  of  the  range  space  of  the  data  misfit 
component  of  the  Hessian  thus  plays  a  critical  role  in  dimensionality  reduction,  and 
we  believe  is  key  to  overcoming  the  curse  of  dimensionality  for  PDE-based  statistical 
inverse  problems.  Clearly  an  important  issue  is  the  construction  of  fast  low  rank 
estimates  of  the  data  misfit  portion  of  the  Hessian  for  different  classes  of  inverse  op¬ 
erators,  and  in  particular  for  inverse  wave  propagation.  This  is  discussed  in  Section 
3  for  time  harmonic  inverse  medium  scattering. 

Finally,  all  of  the  important  computational  kernels  of  stochastic  Newton  resemble 
those  of  large-scale  deterministic  Newton-based  inverse  solvers  (notably  the  solution 
of  forward  and  adjoint  state  problems,  and  the  combination  of  their  solutions  to  form 
gradients  and  actions  of  Hessians  on  vectors).  This  permits  the  exploitation  of  highly 
scalable  parallel  data  structures,  algorithms,  and  implementations  that  have  been 
developed  for  deterministic  inverse  problems,  and  have  been  used  to  solve  the  PDE- 
based  inverse  problems  with  up  to  0(10^)  inversion  parameters  on  multi-thousand 
processor  systems  [4, 5,6,7, 23] . 

3.  Fast  Hessian  approximations  for  time-harmonic  inverse  medium  scat¬ 
tering 

As  discussed  above,  the  ability  to  cheaply  approximate  the  Hessian  of  the  data  mis¬ 
fit  functional  is  critical  to  the  success  of  the  stochastic  Newton  method.  Here  we 
describe  FalMS  (Fast  Inverse  Medium  Solver),  a  novel  algorithm  for  the  construc¬ 
tion  of  fast  Hessian  approximations  for  the  low-frequency  Helmholtz  inverse  medium 
problem  with  broadband  and  multi-point  illuminations.  Inverse  medium  problems 
are  encountered  in  acoustic,  elastic,  and  electromagnetic  wave  propagation.  We  use 
a  Lippmanii-Schwinger  formulation,  which  we  discretize  using  a  quadrature  method. 
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Figure  4:  FalMS  is  a  fast  method  for  the  solution  of  a  Born  approximation  inverse  medium 
problem  in  the  low  frequency  regime.  The  medium  perturbation  is  represented  by  a  set 
of  point  scatterers  {red  squares)  in  a  S-D  domain  H.  The  data  consist  of  measurements 
of  scattered  fields  due  to  several  different  incident  fields.  For  example,  in  subfigure  (a) 
we  depict  a  single-source,  single-frequency  illumination,  in  (b)  we  depict  a  multiple-source, 
multiple-frequency  illumination  in  which  we  collect  data  for  each  different  frequency  and 
in  (c)  we  depict  the  case  of  multiple-source  multiple-frequency  illumination.  The  detectors 
can  be  located  in  arbitrary  positions  (c). 


We  consider  small  perturbations  of  the  background  medium. 

If  N^j  is  the  number  of  excitation  frequencies,  Ng  the  number  of  different  source  loca¬ 
tions  for  the  point  illuminations,  Nd  the  number  of  detectors,  and  N  the  scatterer  dis¬ 
cretization  size,  a  dense  singular  value  decomposition  for  the  overall  input-output  map 
(roughly  speaking,  the  square  root  of  the  Hessian)  will  require  0{[min{NsN^Nd,  A^)]^  x 
max{NsN^^Ndj  N))  operations,  without  accounting  for  the  costs  of  solving  the  for¬ 
ward  problem.  We  have  developed  a  fast  SVD  approach  that  brings  the  cost  down 
to  0{NsNujNdN)  thus,  providing  orders  of  magnitude  improvements  over  a  black-box 
dense  SVD.  The  method  is  also  more  robust  and  readily  parallelizable  when  com¬ 
pared  to  Krylov-based  SVD  approaches.  FalMS  builds  on  our  previous  work  on  fast 
Hessian  [1,2,3,15]  approximations  and  can  be  combined  with  multigrid  methods  such 
as  the  ones  developed  in  [1,5].  The  work  is  described  in  detail  in  [22].  Here,  we  give 
some  details  on  the  problem  statement  and  some  representative  results. 

Given  Ng  point  sources  that  generate  NgN^  incident  fields  (spherical  waves)  {u(r;  5,  , 

at  different  frequencies,  we  measure  the  scattered  field  </)(r^;s,a;)  at  Nd  detector 
locations  We  seek  to  recover  the  medium  perturbation  r/(r),  by  solving 

(l>{Vd;s,uj)  =  /  G{rd,r]uj)7]{r)u{r;s,uj)dr,  u  =  Ui, . . .  s  =  I, ....  Ng.  (5) 

Jh 

This  is  a  Born-approximation  Lippmann-Schwinger  scattering  equation,  where  G(*,  •;  a;) 
is  the  Green’s  function  (in  the  reference  medium)  at  frequency  a;,  H  is  the  support  of 
the  medium  perturbation  r/,  and  r  is  a  point  in  H.  This  equation  is  discretized  using 
N  quadrature  points.  The  problem  setup  is  summarized  in  Figure  4. 

One  solution  approach  would  be  to  use  a  dense  SVD  factorization.  However,  such  an 
approach  is  prohibitively  expensive,  as  the  complexity  of  a  dense  SVD  is  [min(  V)]^  x 

mdix{NgN^Nd,  N)  [24].  (For  example,  if  =  10,  Ng  =  100,  Nd  =  10^,  and  N  =  100^ 
we  will  need  over  one  month  of  computation  to  compute  the  SVD  on  a  single-core, 
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Figure  5:  We  consider  a  scatterer  perturbation  with  a  cross-geometry,  with  a  10%  contrast 
with  the  background  medium  and  we  consider  different  scatterer  sizes,  up  to  ten  wave¬ 
lengths.  In  this  experiment,  we  verify  the  accuracy  and  efficiency  of  our  scheme.  We  test 
FalMS  by  comparing  its  reconstruction  (subfigure  A,C)  to  the  one  obtained  by  solving  the 
output  least  squares  inverse  problem  using  the  LSQR  algorithm  (a)  and  (b)  corresponding 
to  the  case  in  which  the  cross  size  is  one  wavelength.  Subfigures  (c)  and  (d)  correspond 
to  the  case  in  which  the  cross  size  is  ten  wavelengths.  In  both  cases,  we  are  able  to  achieve 
excellent  reconstruction  accuracy  using  FalMS  at  considerably  cheaper  cost. 


two-Gigaflops/sec  machine.)  Our  goal  is  to  design  an  algorithm  that  reconstructs  rj 
and  scales  well  with  and  Nd^  for  the  low  frequency  regime.  Our  main 

contribution  is  the  construction  of  an  approximate  singular  value  decomposition  that 
is  valid  for  arbitrary  distributions  of  sources,  detectors  and  frequencies,  as  long  as  the 
detectors  are  well  separated  from  the  support  of  the  scatterer.  Our  construction  is 
based  on  the  following  algorithmic  components: 

•  rank-revealing  randomized  factorization  ideas; 

•  preprocessing  of  the  incident  field  u  using  SVD  to  transform  the  incoming  field 
and  data  and  reduce  the  dimension  of  Ng] 
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•  and  a  novel  recursive  SVD  algorithm  that  can  be  used  to  compute  the  SVD  of 
M  =  [Ml  M2]  given  the  SVDs  of  Mi  and  M2. 

Using  these  components,  we  propose  an  approximate  SVD  factorization  whose  total 
w'ork  complexity  is  OiNsN^^NdN).  The  main  idea  is  to  decompose  the  Hessian  oper¬ 
ator  into  Ng  X  smaller  submatrices  of  size  Nd  x  N  {1  <  u  <  I  <  s  <  Ng). 
We  compute  the  SVD  of  each  small  matrix  by  using  the  randomized  SVD  proposed 
in  [30].  We  apply  a  low  rank  approximation  whenever  possible,  leading  to  a  com¬ 
pression  of  the  matrix  and  a  speed  up  of  the  computations.  Then,  we  recursively 
compute  the  SVD  of  the  Hessian  given  the  SVDs  of  the  smaller  snbmatrices,  using 
a  novel  method  we  have  devised.  The  SVD  provides  a  precise  characterization  of 
the  inverse  problem  since  it  allows  us  to  easily  apply  pseudo-inverse  of  the  Hessian 
to  the  data.  An  example  of  the  accuracy  of  this  approximate  SVD  is  presented  in 
Figure  5.  Fast  and  accurate  Hessian  approximations  such  as  this  play  an  important 
role  in  making  the  Hessian-preconditioned  Langevin  method  of  Section  2  scalable  to 
large  problem  sizes. 
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